We use the following packages

library(mice)      # Boys dataset
library(tidyverse) # All the good stuff
library(ggplot2)   # Plotting suite (actually included in tidyverse)

Why visualise?

  • We can process a lot of information quickly with our eyes
  • Plots give us information about
    • Distribution / shape
    • Irregularities
    • Assumptions
    • Intuitions
  • Summary statistics, correlations, parameters, model tests, p-values do not tell the whole story
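The last point can be checked with Anscombe's quartet, which ships with base R as the `anscombe` data frame. The following is a minimal sketch and not part of the original slides: the four x/y pairs have nearly identical means and correlations, yet look very different once plotted. Base graphics are used only to keep the example free of extra packages.

```r
# Anscombe's quartet: four x/y pairs with near-identical summary statistics
data(anscombe)

# Mean of y and correlation between x and y for each of the four pairs
quartet_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_y = mean(y), cor_xy = cor(x, y))
})
colnames(quartet_stats) <- paste0("set", 1:4)
round(quartet_stats, 2)

# The plots tell a different story: one panel per pair
par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(mfrow = c(1, 1))
```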

ggplot2

What is ggplot2?

Layered plotting based on the book The Grammar of Graphics by Leland Wilkinson.

With ggplot2 you

  1. provide the data
  2. define how to map variables to aesthetics
  3. state which geometric object to display
  4. (optional) edit the overall theme of the plot

ggplot2 then takes care of the details

The package is extremely popular and well documented.

An example: scatterplot

1: Provide the data

boys |> 
  ggplot()

2: State which geometric object to display

boys |> 
  ggplot() +
  geom_point()

3: Map variables to aesthetics

boys |> 
  ggplot() +
  geom_point(aes(x = hgt, y = wgt))

Why this syntax?

Create the plot

gg <- boys |> 
  ggplot() +
  geom_point(aes(x = hgt, y = wgt),
             col = "dark green")

Add another layer (smooth fit line)

gg <- gg + 
  geom_smooth(aes(x = hgt, y = wgt),
              col = "dark blue")

Give it some labels and a nice look

gg <- gg + 
  labs(x = "Height", y = "Weight", title = "Height and weight of boys") +
  theme_minimal()

plot(gg)
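One reason for the `+` syntax is that a plot is an ordinary R object that can be stored and extended one layer at a time. A minimal sketch, assuming the boys data from mice; the object names `p` and `p2` are illustrative, and `na.rm = TRUE` is added only to silence missing-value warnings:

```r
library(mice)     # boys dataset
library(ggplot2)

# Start with the data and a single point layer
p <- ggplot(boys, aes(x = hgt, y = wgt)) +
  geom_point(na.rm = TRUE)
length(p$layers)   # number of layers so far

# Adding a geom returns a new plot object with one more layer
p2 <- p + geom_smooth(na.rm = TRUE)
length(p2$layers)

# Nothing is drawn until the object is printed
print(p2)
```

Because each step returns a plot object, the same base plot can be reused with different layers, labels, or themes.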

Aesthetics

  • x
  • y
  • size
  • colour
  • fill
  • opacity (alpha)
  • linetype
  • …

Aesthetics

gg <- boys |> 
  filter(!is.na(reg)) |> 
  ggplot() +
  geom_point(aes(x      = hgt, 
                 y      = wgt, 
                 shape  = reg, 
                 colour = age),
             alpha = 0.5) +
  labs(title  = "Trend for boys",
       x      = "Height", 
       y      = "Weight", 
       shape  = "Region",
       colour = "Age") +
  theme_minimal()

plot(gg)
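An aesthetic can either be mapped to a variable (inside `aes()`) or set to a fixed value (outside `aes()`). The sketch below contrasts the two for colour, assuming the boys data from mice; the object names are illustrative.

```r
library(mice)     # boys dataset
library(ggplot2)

boys_reg <- subset(boys, !is.na(reg))

# Mapped: colour depends on a variable, so it goes inside aes()
p_mapped <- ggplot(boys_reg) +
  geom_point(aes(x = hgt, y = wgt, colour = reg), na.rm = TRUE)

# Set: every point gets the same fixed colour, so it goes outside aes()
p_set <- ggplot(boys_reg) +
  geom_point(aes(x = hgt, y = wgt), colour = "darkgreen", na.rm = TRUE)

p_mapped  # one colour per region, with a legend
p_set     # all points dark green, no legend
```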

Geoms

  • geom_point

  • geom_bar

  • geom_line

  • geom_smooth

  • geom_histogram

  • geom_boxplot

  • geom_density

Geoms: Bar
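A minimal sketch of a bar chart, assuming the boys data from mice. The choice of variable (`reg`) and the labels are illustrative, not necessarily the plot shown on the original slide.

```r
library(mice)     # boys dataset
library(ggplot2)

# geom_bar() counts the rows in each category of a discrete variable
p_bar <- ggplot(subset(boys, !is.na(reg)), aes(x = reg)) +
  geom_bar() +
  labs(x = "Region", y = "Number of boys")
p_bar
```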

Geoms: Line
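A minimal sketch of a line plot, assuming the boys data from mice. The summary (mean height per whole year of age) and the names `age_year` and `mean_hgt` are illustrative choices.

```r
library(mice)     # boys dataset
library(dplyr)
library(ggplot2)

# One value per x position: mean height for each whole year of age
hgt_by_age <- boys |>
  mutate(age_year = floor(age)) |>
  group_by(age_year) |>
  summarise(mean_hgt = mean(hgt, na.rm = TRUE))

# geom_line() connects the observations in order of x
p_line <- ggplot(hgt_by_age, aes(x = age_year, y = mean_hgt)) +
  geom_line() +
  labs(x = "Age (years)", y = "Mean height (cm)")
p_line
```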

Geoms: Smooth
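A minimal sketch of a smoothed trend line on top of the raw points, assuming the boys data from mice. The loess method is spelled out here as an illustrative choice.

```r
library(mice)     # boys dataset
library(ggplot2)

# Keep only rows where both plotted variables are observed
boys_complete <- subset(boys, !is.na(age) & !is.na(hgt))

# geom_smooth() adds a fitted curve with a confidence band
p_smooth <- ggplot(boys_complete, aes(x = age, y = hgt)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE) +
  labs(x = "Age (years)", y = "Height (cm)")
p_smooth
```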

Geoms: Boxplot
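A minimal sketch of a boxplot per group, assuming the boys data from mice. Comparing BMI across regions is an illustrative choice.

```r
library(mice)     # boys dataset
library(ggplot2)

# geom_boxplot() summarises a continuous variable within each category
p_box <- ggplot(subset(boys, !is.na(reg) & !is.na(bmi)),
                aes(x = reg, y = bmi)) +
  geom_boxplot() +
  labs(x = "Region", y = "BMI")
p_box
```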

Export figure

Easy with ggsave()

# save the plot object gg as pdf
ggsave("plot.pdf", plot = gg)

# save as png and specify dimensions 
ggsave("plot.png", plot = gg, width = 7, height = 5, units = "in")

R Markdown

What is it?

  • R Markdown is a file format for making dynamic documents.

Who is it for?

  • For communicating to decision makers, who want to focus on the conclusions, not the code behind the analysis.
  • For collaborating with other data scientists (including future you!), who are interested in both your conclusions and how you reached them (i.e. the code).
  • As an environment in which to do data science, as a modern-day lab notebook where you can capture not only what you did, but also what you were thinking.

Types of R Markdown output

  • Documents (HTML, PDF, Word, RTF, Markdown)
  • Presentations
    • ioslides (HTML)
    • Beamer (PDF)
    • PowerPoint (PowerPoint)
  • Journals
    • elsevier_article
    • jss_article
  • Dashboards
    • Flexdashboard
  • Websites and blogs
    • blogdown
  • Books
    • bookdown (HTML, PDF, ePUB and Kindle books)

Start a new Markdown-document

To create a new document in RStudio, choose File → New File → R Markdown…
You can set a title and an output format, and RStudio then sets up the basics for you.

R Markdown code

What does R Markdown contain?

It contains three important types of content:

  • An (optional) YAML header surrounded by lines of three dashes (---).
  • Chunks of R code surrounded by lines of three backticks (```).
  • Text mixed with simple formatting, such as # heading and *italics*.
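The three types of content can be seen together in one small file. The sketch below writes such a file from R so that it can be inspected; the title, text, and file name are illustrative, and the built-in `cars` data is used only as a stand-in.

```r
# A small R Markdown document containing all three kinds of content
rmd_lines <- c(
  "---",                          # YAML header starts
  'title: "A minimal example"',
  "output: html_document",
  "---",                          # YAML header ends
  "",
  "# A heading",                  # formatted text
  "",
  "Some text with *italics*.",
  "",
  "```{r}",                       # code chunk starts
  "summary(cars$speed)",
  "```"                           # code chunk ends
)

# Write it to a temporary .Rmd file
rmd_path <- file.path(tempdir(), "minimal.Rmd")
writeLines(rmd_lines, rmd_path)

# Print the file as it would look in the editor
cat(readLines(rmd_path), sep = "\n")
```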

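How does R Markdown work?

When you knit a document, knitr runs the code chunks and writes the code and its output to a Markdown file, which pandoc then converts to the chosen output format.

In practice you just push the Knit icon (Ctrl+Shift+K).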
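The first half of what the Knit button does can be run by hand. This is a minimal sketch using `knitr::knit()` on a throwaway file (the file names are illustrative); it produces the intermediate Markdown, in which the chunk output already appears.

```r
library(knitr)

# A tiny R Markdown file with one code chunk
rmd_path <- file.path(tempdir(), "knit-demo.Rmd")
writeLines(c(
  "Doubling a number:",
  "",
  "```{r}",
  "a <- 100",
  "a * 2",
  "```"
), rmd_path)

# knitr runs the chunk and writes plain Markdown containing code and output
md_path <- knit(rmd_path,
                output = file.path(tempdir(), "knit-demo.md"),
                quiet = TRUE)
cat(readLines(md_path), sep = "\n")

# rmarkdown::render(rmd_path) would perform this step and then call pandoc
# to produce the final document (not run here, since it needs pandoc)
```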

YAML

YAML: originally “Yet Another Markup Language”, now officially “YAML Ain’t Markup Language”

  • Controls many “whole document” settings
  • Possible to set document parameters
  • Possible to specify bibliography

RStudio creates a YAML header when starting from scratch with File \(\rightarrow\) New File \(\rightarrow\) R Markdown…
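To see what such a header looks like and how R reads it, the sketch below writes a small document with a title, an output format, one parameter, and a bibliography entry, then parses the header with `rmarkdown::yaml_front_matter()`. All field values (the title, the `region` parameter, `references.bib`) are hypothetical.

```r
library(rmarkdown)

# A document whose YAML header sets a title, an output format,
# a document parameter, and a bibliography file
rmd_lines <- c(
  "---",
  'title: "Growth of Dutch boys"',
  "output: html_document",
  "params:",
  "  region: north",
  "bibliography: references.bib",
  "---",
  "",
  "Text of the report."
)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Parse only the header; the result is a named list
header <- yaml_front_matter(rmd_path)
header$title
header$params$region
```

Inside the document itself, a parameter declared this way is available to code chunks as `params$region`.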

Text

Examples

  • *italic* gives italic

  • **bold** gives bold

  • ~~Strikethrough~~ gives Strikethrough

  • superscript^2^ and subscript~2~ give superscript² and subscript₂

  • $e^{i\pi}+1=0$ gives \(e^{i\pi}+1=0\)

  • `r nrow(mice::boys)` gives 748

  • Headers:
    • # Level 1 header
    • ## Level 2 header

Lists

* Blah Blah
* Blah

gives

  • Blah Blah
  • Blah

1. Blah Blah
2. Blah

gives

  1. Blah Blah
  2. Blah

Code chunks

Creating a code chunk

To integrate R code into your document, you create a code chunk (shortcut: Ctrl+Alt+I).
By default the document shows both your code and its console output. So the following code chunk in your R Markdown file:

```{r}
a <- 100
a*2
```

will be printed in your document as:

a <- 100
a*2
## [1] 200

While writing your R Markdown file, you can run the current line of a chunk with Ctrl+Enter, or the entire chunk with Ctrl+Shift+Enter.

Tables - (normal print)

Tables printed the default way are not pretty, but the kableExtra package makes it easy to produce readable tables.

boys |> 
  select(-gen, -phb, -tv) |> 
  head()
##      age  hgt   wgt   bmi   hc   reg
## 3  0.035 50.1 3.650 14.54 33.7 south
## 4  0.038 53.5 3.370 11.77 35.0 south
## 18 0.057 50.0 3.140 12.56 35.2 south
## 23 0.060 54.5 4.270 14.37 36.7 south
## 28 0.062 57.5 5.030 15.21 37.3 south
## 36 0.068 55.5 4.655 15.11 37.0 south

Tables - knitr::kable

boys |> 
  select(-gen, -phb, -tv, -hc) |> 
  head() |> 
  knitr::kable(format = "html",
               col.names = c("Age", "Height", "Weight", "BMI", "Region"),
               align = "ccccc",
               caption = "The first 6 boys")

The first 6 boys

        Age    Height   Weight    BMI    Region
3      0.035    50.1    3.650    14.54   south
4      0.038    53.5    3.370    11.77   south
18     0.057    50.0    3.140    12.56   south
23     0.060    54.5    4.270    14.37   south
28     0.062    57.5    5.030    15.21   south
36     0.068    55.5    4.655    15.11   south

Tables - knitr::kable and kableExtra

library(kableExtra) # provides kable_styling()

boys |> 
  select(-gen, -phb, -tv, -hc) |> 
  head() |> 
  knitr::kable(col.names = c("Age", "Height", "Weight", "BMI", "Region"),
               align = "ccccc",
               caption = "The first 6 boys") |> 
  kable_styling(bootstrap_options = c("hover"),
                full_width = FALSE, position = "left")

The first 6 boys

        Age    Height   Weight    BMI    Region
3      0.035    50.1    3.650    14.54   south
4      0.038    53.5    3.370    11.77   south
18     0.057    50.0    3.140    12.56   south
23     0.060    54.5    4.270    14.37   south
28     0.062    57.5    5.030    15.21   south
36     0.068    55.5    4.655    15.11   south

Figures

R Markdown notices when a code chunk produces a figure and includes it in the output document.

boys |> 
  ggplot() +
  geom_point(aes(x = age,
                 y = hgt))

## Warning: Removed 20 rows containing missing values (`geom_point()`).
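The warning appears because some boys have no recorded height. A minimal sketch of two common ways to deal with it, assuming the boys data from mice; the object names are illustrative.

```r
library(mice)     # boys dataset
library(ggplot2)

# How many rows lack one of the plotted variables?
sum(is.na(boys$age) | is.na(boys$hgt))

# Option 1: drop incomplete rows explicitly before plotting
boys_complete <- subset(boys, !is.na(age) & !is.na(hgt))
p1 <- ggplot(boys_complete) +
  geom_point(aes(x = age, y = hgt))

# Option 2: keep the data as is and silence the warning in the geom
p2 <- ggplot(boys) +
  geom_point(aes(x = age, y = hgt), na.rm = TRUE)

p1
p2
```

Option 1 makes the exclusion visible in the code, which is usually preferable in a report that others will read.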

Chunk options

Sometimes you want to show only your results, not your code. You can control this with a chunk option.
The option that hides the code in the output is called echo.

```{r echo = FALSE}
a <- 100
a*2
```
## [1] 200
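Chunk options can also be set once for the whole document instead of per chunk. A minimal sketch using `knitr::opts_chunk$set()`, which is typically called in a setup chunk near the top of the file; the particular options chosen here are illustrative.

```r
library(knitr)

# Defaults applied to every chunk that follows
opts_chunk$set(
  echo    = FALSE,  # hide the code
  message = FALSE,  # hide messages
  warning = FALSE   # hide warnings
)

# Inspect the current default for one option
opts_chunk$get("echo")
```

An option given in an individual chunk header still overrides these defaults for that chunk.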

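Chunk options

x marks what the chunk will not do:

Option              Run code  Show code  Output  Plots  Messages  Warnings
eval = FALSE           x                   x       x       x         x
include = FALSE                   x        x       x       x         x
echo = FALSE                      x
results = "hide"                           x
fig.show = "hide"                                  x
message = FALSE                                            x
warning = FALSE                                                      x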

Good resources to learn more

R Markdown

ggplot2

  • ggplot2 - The book: The definitive guide to ggplot2, written by Hadley Wickham, who is also the author of the package. It is quite long, but if you want to truly understand ggplot2, this is the place to start.
  • ggplot2 - Cheat sheet: A very useful cheat sheet
  • sf - Maps in ggplot2: sf is a package for spatial data; ggplot2 can draw sf objects as maps with geom_sf()

Final note

  • If you want to learn more about everything we have looked at, the best place to go is probably the online book R for Data Science.
  • It is free and written by Hadley Wickham, who is the main author behind the tidyverse.
  • You can find it here: r4ds.had.co.nz/

Practical